Co-Segmentation for Fine Grained Visual Categorization

Authors

  • Yuning Chai
  • Victor Lempitsky
  • Andrew Zisserman
Abstract

In this extended abstract we review our works [1, 2] on fine-grained visual classification (FGVC) and present the most recent results of our classification pipeline. In particular, we focus on the importance of foreground segmentation, and show that accurate segmentation of the training images is highly beneficial for classification accuracy at test time. We demonstrate the merit of relatively simple, unsupervised, and scalable co-segmentation methods that exploit the similarity of foreground appearance within each class. By encouraging the obtained foregrounds to contain class-discriminative information, the test-time classification accuracy can be improved even further. Experimental evaluation on several popular FGVC datasets shows that the combination of improved segmentation of the training images and advanced feature encodings (Fisher vectors) achieves state-of-the-art FGVC accuracy.

There is abundant evidence that accurate foreground-background segmentation can benefit visual recognition. This is particularly true when image classification deals with subordinate visual categories, e.g. flower species [5] or bird species [10]. In this scenario, background visual features tend to be similar across categories and typically act as a distraction to statistical learning rather than as useful context. Removing the background at training time therefore boosts classification performance. Training a segmenter, however, often requires pixel-level foreground annotation, which is expensive to obtain. We therefore introduced BiCoS [1], a co-segmentation method that was shown to improve classification accuracy on various benchmark fine-grained visual categorization datasets without using any extra annotations (except for the image-level category labels). BiCoS (for Bi-level Co-Segmentation) applies co-segmentation to the sets of training images corresponding to individual classes (considering one class at a time).
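The per-class application of co-segmentation can be sketched as a simple driver loop. The `cosegment` callback below is a hypothetical stand-in for the BiCoS inner routine (not the authors' implementation); the sketch only illustrates the "one class at a time" grouping described above.

```python
from collections import defaultdict

def cosegment_per_class(images, labels, cosegment):
    """Group training images by class label and co-segment each group
    independently, as BiCoS does (one class at a time).

    `cosegment` is a user-supplied function mapping a list of images
    to a list of foreground masks (hypothetical interface).
    """
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    masks = [None] * len(images)
    for lab, idxs in by_class.items():
        # Co-segmentation sees all images of one class jointly.
        class_masks = cosegment([images[i] for i in idxs])
        for i, m in zip(idxs, class_masks):
            masks[i] = m
    return masks
```

This structure is what makes the approach scalable: each class is processed independently, so the overall cost grows linearly with the number of images.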
In a nutshell, BiCoS operates at two levels of representation. At the bottom level, it treats each image separately and applies the well-known GrabCut algorithm [8] to the RGB values of individual pixels, whereas at the top level a discriminative foreground-background classification is performed on high-dimensional descriptors of superpixels. The top layer operates on all images of the class jointly and propagates information about foreground and background appearance across the images. Fig. 1 describes the steps of the algorithm. The algorithm scales linearly with the number of images. Unlike many co-segmentation algorithms, BiCoS does not assume similarity of the global geometric shape throughout the image set, which is important for classes undergoing out-of-plane and non-rigid deformations.

While BiCoS is applied to each fine-grained category individually, our second method, TriCoS [2], further improves the classification accuracy by considering all categories jointly. This method adds another layer to the processing (hence the name) that pushes class-discriminative superpixels to the foreground and vice versa. To determine the discriminability of superpixels, the category classifiers are re-estimated within the optimization loop.

Classification pipeline. Once BiCoS is applied to the training images, we discard the information contained in the estimated background regions and use the foreground parts to encode the training images. At test time, a universal foreground-background classifier is used (trained on the BiCoS results over all training classes). This classifier is applied at the superpixel level, and its result is further refined with GrabCut (in a manner similar to the last step of the BiCoS algorithm in Fig. 1). Thus, both at train and at test time we consider only the foreground regions and discard the background.
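The top-level step can be illustrated with a minimal sketch: fit a linear classifier to the current foreground/background labels of superpixels pooled over all images of a class, then relabel every superpixel with the classifier's prediction, which is how appearance information propagates across images. This uses plain logistic regression as an illustrative stand-in for the discriminative classifier, not the authors' exact formulation.

```python
import numpy as np

def update_superpixel_labels(descs, fg, iters=200, lr=0.5):
    """One top-level co-segmentation step (illustrative sketch).

    descs : (n, d) array of superpixel descriptors pooled over all
            images of one class
    fg    : (n,) 0/1 array, current foreground indicator

    Fits a linear logistic classifier to the current labels, then
    returns the relabeled superpixels.
    """
    X = np.hstack([descs, np.ones((len(descs), 1))])  # append bias term
    w = np.zeros(X.shape[1])
    y = fg.astype(float)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # sigmoid probabilities
        w -= lr * X.T @ (p - y) / len(y)      # gradient step on log-loss
    return (X @ w > 0).astype(int)            # relabeled superpixels
```

In the full algorithm this relabeling would alternate with the per-image bottom level (GrabCut on pixel colors), which is what couples the two levels of representation.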
The foreground region is represented by (a) an LLC-encoded [9] color histogram and (b) an improved Fisher vector [7] computed on densely sampled SIFT descriptors. These two vectors are concatenated to form a descriptor for each image. We use a standard 1-vs-all linear SVM for the final classification.

Comparison and discussion. In Tab. 1 we extensively compare FGVC accuracies for the classification pipeline defined above when different foreground segmentation approaches, using different levels of annotation at train time and at test time, are employed. The comparison reveals the usefulness of the proposed co-segmentation algorithms in the regime where the bounding box segmentation is not known at test time. For all three datasets under consideration, co-segmentation approaches outperform the approach that segments each training image separately. Furthermore, perhaps surprisingly, unsupervised co-segmentation
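The final classification stage can be sketched as follows: concatenate the two encodings into one descriptor, then train one linear classifier per class and predict by the highest score. Per-part L2 normalization and the hinge-loss subgradient training are assumptions of this sketch (the abstract does not specify them); it is a minimal stand-in for a standard 1-vs-all linear SVM, not the authors' implementation.

```python
import numpy as np

def image_descriptor(color_llc, sift_fv):
    """Concatenate the two encodings into one image descriptor.
    L2-normalizing each part before concatenation is an assumption
    of this sketch."""
    a = color_llc / (np.linalg.norm(color_llc) + 1e-12)
    b = sift_fv / (np.linalg.norm(sift_fv) + 1e-12)
    return np.concatenate([a, b])

def one_vs_all_train(X, y, n_classes, iters=300, lr=0.1, reg=0.01):
    """Train one linear hinge-loss (SVM-style) classifier per class
    by subgradient descent."""
    n, d = X.shape
    W = np.zeros((n_classes, d))
    for k in range(n_classes):
        t = np.where(y == k, 1.0, -1.0)        # +1 for class k, else -1
        w = np.zeros(d)
        for _ in range(iters):
            margin = t * (X @ w)
            mask = margin < 1                   # margin-violating examples
            grad = reg * w - (t[mask] @ X[mask]) / n
            w -= lr * grad
        W[k] = w
    return W

def one_vs_all_predict(W, X):
    return np.argmax(X @ W.T, axis=1)           # highest score wins
```

In practice one would use an off-the-shelf linear SVM solver; the sketch only shows the 1-vs-all reduction that the pipeline relies on.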




Publication date: 2013